
    Enabling parallelism and optimizations in data mining algorithms for power-law data

    Today's data mining tasks aim to extract meaningful information from large amounts of data in a reasonable time, mainly by means of (a) algorithmic advances, such as fast approximate algorithms and efficient learning algorithms, and (b) architectural advances, such as machines with massive compute capacity involving distributed multi-core processors and high-throughput accelerators. For current and future generation processors, parallel algorithms are critical for fully utilizing computing resources. Furthermore, exploiting data properties for performance gain becomes crucial for data mining applications. In this work, we focus our attention on power-law behavior, a common property found in a large class of data, such as text data, internet traffic, and click-stream data. Specifically, we address the following questions in the context of power-law data: How well do the critical data mining algorithms of current interest fit with today's parallel architectures? Which algorithmic and mapping opportunities can be leveraged to further improve performance? What are the relative challenges and gains for such approaches? We first investigate the suitability of the "frequency estimation" problem for GPU-scale parallelism. Sketching algorithms are a popular choice for this task due to their desirable trade-off between estimation accuracy and space-time efficiency. However, most of the past work on sketch-based frequency estimation focused on CPU implementations. In our work, we propose a novel approach for sketches, which exploits the natural skewness in power-law data to efficiently utilize the massive amounts of parallelism in modern GPUs. Next, we explore the problem of "identifying top-K frequent elements" for distributed data streams in modern distributed settings with both multi-core and multi-node CPU parallelism. 
Sketch-based approaches, such as Count-Min Sketch (CMS) with a top-K heap, have an excellent update time but lack the important property of reducibility, which is needed for exploiting data parallelism. At the other end, the popular Frequent Algorithm (FA) leads to reducible summaries, but its update costs are high. Our approach, Topkapi, gives the best of both worlds, i.e., it is reducible like FA and has an efficient update time similar to CMS. For power-law data, Topkapi possesses strong theoretical guarantees and leads to significant performance gains relative to past work. Finally, we study Word2Vec, a popular word embedding method widely used in machine learning and natural language processing applications, such as machine translation, sentiment analysis, and query answering. This time, we target Single Instruction Multiple Data (SIMD) parallelism. With the increasing vector lengths in commodity CPUs, such as AVX-512 with a vector length of 512 bits, efficient utilization of the vector processing unit becomes a major performance game-changer. By employing a static multi-version code generation strategy coupled with an algorithmic approximation based on the power-law frequency distribution of words, we achieve significant reductions in training time relative to the state of the art. Ph.D.
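To make the notion of a reducible summary concrete, here is a minimal Python sketch of the Misra-Gries summary, which is commonly what "Frequent Algorithm" refers to, together with a merge step in the style of the standard mergeable-summaries construction. This is not Topkapi and not code from the thesis; the function names `frequent_summary` and `merge_summaries`, the parameter `k`, and the toy streams are all assumptions made for illustration.

```python
from collections import Counter


def frequent_summary(stream, k):
    """Misra-Gries summary of a stream using at most k - 1 counters."""
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # Summary is full: decrement every counter and drop zeros.
            # This full pass is what makes the update cost high.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


def merge_summaries(a, b, k):
    """Combine two summaries into one that again has at most k - 1 counters."""
    merged = Counter(a)
    merged.update(b)  # add counts key by key
    if len(merged) <= k - 1:
        return dict(merged)
    # Subtract the k-th largest count and keep only positive counters.
    kth = sorted(merged.values(), reverse=True)[k - 1]
    return {key: c - kth for key, c in merged.items() if c - kth > 0}


# Hypothetical skewed streams held by two workers.
k = 3
left = frequent_summary(["a"] * 50 + ["b"] * 30 + list("cdefg"), k)
right = frequent_summary(["a"] * 40 + ["c"] * 20 + list("hijkl"), k)
merged = merge_summaries(left, right, k)
print(merged)
```

Because each worker can summarize its own partition and the partial summaries can then be combined pairwise, the summary supports a parallel reduction; each reported count underestimates the true count by at most n/k for a combined stream of length n.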

    Molecular cloud formation by compression of magnetized turbulent gas subjected to radiative cooling

    Complex turbulent motions of magnetized gas are ubiquitous in the interstellar medium (ISM). The source of this turbulence, however, is still poorly understood. Previous work suggests that compression caused by supernova shockwaves, gravity, or cloud collisions may drive the turbulence to some extent. In this work, we present three-dimensional (3D) magnetohydrodynamic (MHD) simulations of contraction in turbulent, magnetized clouds from the warm neutral medium (WNM) of the ISM to the formation of cold dense molecular clouds, including radiative heating and cooling. We study different contraction rates and find that observed molecular cloud properties, such as the temperature, density, Mach number, and magnetic field strength, and their respective scaling relations, are best reproduced when the contraction rate equals the turbulent turnover rate. In contrast, if the contraction rate is significantly larger (smaller) than the turnover rate, the compression drives too much (too little) turbulence, producing unrealistic cloud properties. We find that the density probability distribution function evolves from a double log-normal representing the two-phase ISM to a skewed, single log-normal in the dense, cold phase. For purely hydrodynamical simulations, we find that the effective driving parameter of contracting cloud turbulence is natural to mildly compressive ($b \sim 0.4$--$0.5$), while for MHD turbulence, we find $b \sim 0.3$--$0.4$, i.e., solenoidal to naturally mixed. Overall, the physical properties of the simulated clouds that contract at a rate equal to the turbulent turnover rate indicate that large-scale contraction may explain the origin and evolution of turbulence in the ISM. Comment: 18 pages, 9 figures. Accepted for publication in MNRAS

    A self-gravity module for the PLUTO code

    We present a novel implementation of an iterative solver for the solution of the Poisson equation in the PLUTO code for astrophysical fluid dynamics. Our solver relies on a relaxation method in which convergence is sought as the steady-state solution of a parabolic equation, whose time-discretization is governed by the Runge-Kutta-Legendre (RKL) method. Our findings indicate that the RKL-based Poisson solver, which is both fully parallel and rapidly convergent, has the potential to serve as a practical alternative to conventional iterative solvers such as the Gauss-Seidel (GS) and successive over-relaxation (SOR) methods. Additionally, it can mitigate some of the drawbacks of these traditional techniques. We incorporate our algorithm into a multigrid solver to provide a simple and efficient gravity solver that can be used to obtain the gravitational potentials in self-gravitational hydrodynamics. We test our implementation against a broad range of standard self-gravitating astrophysical problems designed to examine different aspects of the code. We demonstrate that the results match excellently with the analytical predictions (when available) and the findings of similar previous studies. Comment: Submitted to ApJS. Comments are welcome
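The relaxation idea in this abstract, reaching the Poisson solution as the steady state of a parabolic equation, can be illustrated with a minimal one-dimensional Python sketch. It deliberately swaps in plain explicit Euler pseudo-time stepping for the RKL scheme the authors describe, and it is unrelated to the actual PLUTO implementation; the function name `relax_poisson_1d`, the grid size, the source term, and the boundary values are all assumptions chosen so that the exact answer is known.

```python
def relax_poisson_1d(source, phi_left, phi_right, dx, steps):
    """Relax d(phi)/d(tau) = phi'' - source to steady state on a uniform grid.

    Dirichlet boundary values are held fixed; the steady state satisfies
    the discrete Poisson equation phi'' = source at interior points.
    """
    n = len(source)
    # Start from a straight line between the two boundary values.
    phi = [phi_left + (phi_right - phi_left) * i / (n - 1) for i in range(n)]
    dt = 0.4 * dx * dx  # explicit Euler is stable only for dt <= dx^2 / 2
    for _ in range(steps):
        new = phi[:]
        for i in range(1, n - 1):
            laplacian = (phi[i - 1] - 2.0 * phi[i] + phi[i + 1]) / (dx * dx)
            new[i] = phi[i] + dt * (laplacian - source[i])
        phi = new
    return phi


# Hypothetical test problem: phi'' = 2 with phi(0) = 0 and phi(1) = 1,
# whose exact solution is phi(x) = x^2.
n = 21
dx = 1.0 / (n - 1)
phi = relax_poisson_1d([2.0] * n, 0.0, 1.0, dx, steps=5000)
max_error = max(abs(phi[i] - (i * dx) ** 2) for i in range(n))
print(max_error)
```

The step-size limit noted in the comment forces the number of pseudo-time steps to grow rapidly as the grid is refined, which is the kind of cost that more elaborate time-stepping schemes and multigrid acceleration are intended to reduce.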

    Modelling observable signatures of jet-ISM interaction: thermal emission and gas kinematics

    Relativistic jets are believed to have a substantial impact on the gas dynamics and evolution of the interstellar medium (ISM) of their host galaxies. In this paper, we aim to draw a link between the simulations and the observable signatures of jet-ISM interactions by analyzing the emission morphology and gas kinematics resulting from jet-induced shocks in simulated disc and spherical systems. We find that the jet-induced, laterally expanding forward shock of the energy bubble sweeping through the ISM causes large-scale outflows, creating shocked emission and high velocity dispersion in the entire nuclear regions ($\sim 2$ kpc) of their hosts. The jetted systems exhibit larger velocity widths (> 800 km/s), broader position-velocity maps, and more distorted symmetry in the disc's projected velocities than systems without a jet. We also investigate the above quantities at different inclination angles of the observer with respect to the galaxy. Jets inclined to the gas disc of their host are found to be confined for longer times, and consequently couple more strongly with the disc gas. This results in prominent shocked emission and high velocity widths, not only along the jet's path, but also in the regions perpendicular to it. Strong interaction of the jet with a gas disc can also distort its morphology. However, after the jets escape their initial confinement, the jet-disc coupling is weakened, thereby lowering the shocked emission and velocity widths. Comment: Matches the published version

    Search for new particles in events with energetic jets and large missing transverse momentum in proton-proton collisions at $\sqrt{s} = 13$ TeV

    A search is presented for new particles produced at the LHC in proton-proton collisions at $\sqrt{s} = 13$ TeV, using events with energetic jets and large missing transverse momentum. The analysis is based on a data sample corresponding to an integrated luminosity of 101 fb$^{-1}$, collected in 2017-2018 with the CMS detector. Machine learning techniques are used to define separate categories for events with narrow jets from initial-state radiation and events with large-radius jets consistent with a hadronic decay of a W or Z boson. A statistical combination is made with an earlier search based on a data sample of 36 fb$^{-1}$, collected in 2016. No significant excess of events is observed with respect to the standard model background expectation determined from control samples in data. The results are interpreted in terms of limits on the branching fraction of an invisible decay of the Higgs boson, as well as constraints on simplified models of dark matter, on first-generation scalar leptoquarks decaying to quarks and neutrinos, and on models with large extra dimensions. Several of the new limits, specifically for spin-1 dark matter mediators, pseudoscalar mediators, colored mediators, and leptoquarks, are the most restrictive to date. Peer reviewed

    MUSiC: a model-unspecific search for new physics in proton-proton collisions at $\sqrt{s} = 13$ TeV

    Results of the Model Unspecific Search in CMS (MUSiC), using proton-proton collision data recorded at the LHC at a centre-of-mass energy of 13 TeV, corresponding to an integrated luminosity of 35.9 fb$^{-1}$, are presented. The MUSiC analysis searches for anomalies that could be signatures of physics beyond the standard model. The analysis is based on the comparison of observed data with the standard model prediction, as determined from simulation, in several hundred final states and multiple kinematic distributions. Events containing at least one electron or muon are classified based on their final state topology, and an automated search algorithm surveys the observed data for deviations from the prediction. The sensitivity of the search is validated using multiple methods. No significant deviations from the predictions have been observed. For a wide range of final state topologies, agreement is found between the data and the standard model simulation. This analysis complements dedicated search analyses by significantly expanding the range of final states covered, using a model-independent approach with the largest data set to date to probe phase space regions beyond the reach of previous general searches. Peer reviewed

    Measurement of prompt open-charm production cross sections in proton-proton collisions at $\sqrt{s} = 13$ TeV

    The production cross sections for prompt open-charm mesons in proton-proton collisions at a center-of-mass energy of 13 TeV are reported. The measurement is performed using a data sample collected by the CMS experiment corresponding to an integrated luminosity of 29 nb$^{-1}$. The differential production cross sections of the $D^{*\pm}$, $D^{\pm}$, and $D^{0}$ ($\overline{D}^{0}$) mesons are presented in ranges of transverse momentum and pseudorapidity $4 < p_{\mathrm{T}} < 100$ GeV and $|\eta| < 2.1$, respectively. The results are compared to several theoretical calculations and to previous measurements. Peer reviewed

    Combined searches for the production of supersymmetric top quark partners in proton-proton collisions at $\sqrt{s} = 13$ TeV

    A combination of searches for top squark pair production using proton-proton collision data at a center-of-mass energy of 13 TeV at the CERN LHC, corresponding to an integrated luminosity of 137 fb$^{-1}$ collected by the CMS experiment, is presented. Signatures with at least 2 jets and large missing transverse momentum are categorized into events with 0, 1, or 2 leptons. New results are presented for regions of parameter space where the kinematical properties of top squark pair production and top quark pair production are very similar. Depending on the model, the combined result excludes a top squark mass up to 1325 GeV for a massless neutralino, and a neutralino mass up to 700 GeV for a top squark mass of 1150 GeV. Top squarks with masses from 145 to 295 GeV, for neutralino masses from 0 to 100 GeV, with a mass difference between the top squark and the neutralino in a window of 30 GeV around the mass of the top quark, are excluded for the first time with CMS data. The results of these searches are also interpreted in an alternative signal model of dark matter production via a spin-0 mediator in association with a top quark pair. Upper limits are set on the cross section for mediator particle masses of up to 420 GeV.

    Development and validation of HERWIG 7 tunes from CMS underlying-event measurements

    This paper presents new sets of parameters (“tunes”) for the underlying-event model of the HERWIG7 event generator. These parameters control the description of multiple-parton interactions (MPI) and colour reconnection in HERWIG7, and are obtained from a fit to minimum-bias data collected by the CMS experiment at $\sqrt{s} = 0.9$, 7, and 13 TeV. The tunes are based on the NNPDF 3.1 next-to-next-to-leading-order parton distribution function (PDF) set for the parton shower, and either a leading-order or next-to-next-to-leading-order PDF set for the simulation of MPI and the beam remnants. Predictions utilizing the tunes are produced for event shape observables in electron-positron collisions, and for minimum-bias, inclusive jet, top quark pair, and Z and W boson events in proton-proton collisions, and are compared with data. Each of the new tunes describes the data at a reasonable level, and the tunes using a leading-order PDF for the simulation of MPI provide the best description of the data.